# Vision-Language Alignment
## InternVL3-14B-Instruct GGUF
unsloth · Apache-2.0 · Image-to-Text · Transformers · 982 downloads · 1 like

InternVL3-14B-Instruct is an advanced multimodal large language model (MLLM) with strong multimodal perception and reasoning capabilities, supporting tasks such as tool use, GUI agents, industrial image analysis, and 3D visual perception.
## Anon
aiden200 · Apache-2.0 · English · 361 downloads · 0 likes

A fine-tuned version of lmms-lab/llava-onevision-qwen2-7b-ov that supports video-text-to-text tasks.
## ViT-BART Image Captioner
SrujanTopalle · Apache-2.0 · Image-to-Text · English · 15 downloads · 1 like

A vision-language model that pairs a ViT image encoder with BART-Large to generate English descriptions of images.
## TITAN
MahmoodLab · Multimodal Fusion · Safetensors · English · 213.39k downloads · 37 likes

TITAN is a multimodal whole-slide foundation model for pathology image analysis, pre-trained with visual self-supervised learning and vision-language alignment.
## LLM2CLIP Llama-3-8B-Instruct CC Finetuned
microsoft · Apache-2.0 · Multimodal Fusion · 18.16k downloads · 35 likes

LLM2CLIP enhances CLIP's cross-modal capabilities with large language models, significantly improving the discriminative power of its visual and text representations.
## IP-Adapter-Instruct
CiaraRowles · Apache-2.0 · Image Generation · English · 103 downloads · 51 likes

IP-Adapter-Instruct is an image-to-image model focused on instruction-guided image editing and generation.
## Cambrian 8B
nyu-visionx · Apache-2.0 · Text-to-Image · Transformers · 565 downloads · 63 likes

Cambrian is an open-source multimodal LLM designed with a vision-centric approach.
## Libra 11B Base
YifanXu · Apache-2.0 · Image-to-Text · Transformers · 18 downloads · 0 likes

Libra is a decoupled vision system built on large language models, with foundational multimodal understanding capabilities.
## CLIP ViT-B/16 CommonPool.L.image S1b B8k
laion · MIT · Text-to-Image · 70 downloads · 0 likes

A vision-language model based on the CLIP architecture, supporting zero-shot image classification.
## Plip
vinid · Text-to-Image · Transformers · 177.58k downloads · 45 likes

CLIP is a multimodal vision-language model that maps images and text into a shared embedding space, enabling zero-shot image classification and cross-modal retrieval.
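The shared-embedding idea behind CLIP-style models can be illustrated without any model weights: once an image and a set of text prompts are embedded into the same unit sphere, zero-shot classification is just picking the prompt with the highest cosine similarity. The vectors below are hypothetical stand-ins for encoder outputs, not output from Plip or any real checkpoint.

```python
import numpy as np

def normalize(v):
    # Project vectors onto the unit sphere so dot products equal cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical 4-dimensional embeddings standing in for encoder outputs;
# a real CLIP-style model would produce these from an image and text prompts.
image_emb = normalize(np.array([0.9, 0.1, 0.0, 0.1]))
text_embs = normalize(np.array([
    [0.8, 0.2, 0.1, 0.0],   # e.g. "a photo of a cat"
    [0.0, 0.1, 0.9, 0.3],   # e.g. "a photo of a dog"
]))
labels = ["cat", "dog"]

# Zero-shot classification: choose the caption whose embedding is closest to the image's.
scores = text_embs @ image_emb
print(labels[int(np.argmax(scores))])  # cat
```

Cross-modal retrieval is the same computation in the other direction: score one text embedding against a bank of image embeddings and rank by similarity.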
## M-BERT Distil 40
M-CLIP · Text-to-Image · Transformers · Multilingual · 46 downloads · 8 likes

Based on distilbert-base-multilingual and fine-tuned so that its sentence embeddings for 40 languages align with the embedding space of the CLIP text encoder.
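The alignment objective used by this kind of model can be sketched in isolation: train a student encoder so its embeddings of non-English sentences land where the CLIP text encoder puts their English translations. The toy below replaces both encoders with random matrices and the fine-tuning loop with a closed-form least-squares map, which minimizes the same MSE objective; all dimensions and data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: student embeddings of 32 non-English sentences (dim 6)
# and teacher CLIP text embeddings of their English translations (dim 4).
student = rng.normal(size=(32, 6))
teacher = rng.normal(size=(32, 4))

# Alignment objective: find a linear map W minimizing ||student @ W - teacher||^2,
# i.e. the MSE loss that the real fine-tuning minimizes over the whole student network.
W, *_ = np.linalg.lstsq(student, teacher, rcond=None)

mse = float(np.mean((student @ W - teacher) ** 2))
print(f"alignment MSE: {mse:.4f}")
```

After alignment, the student's embeddings can be scored directly against CLIP image embeddings, which is what lets a 40-language text encoder drive retrieval over a CLIP image index.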